Cobain GIC
Overview:
This page contains the results of CoNGA analyses.
Results in tables may have been filtered to reduce redundancy,
focus on the most important columns, and
limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.
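Since the tables on this page may be filtered, the full TSV files are the place to look for complete results. The following is a minimal sketch for reading them, assuming only that they are tab-separated with a header row and share the prefix given by --outfile_prefix (tmp_rhesus in the command below). The helper name load_conga_tables is hypothetical and not part of CoNGA.

```python
import csv
import glob
import os


def load_conga_tables(outfile_prefix, directory="."):
    """Read every <outfile_prefix>*.tsv file in `directory`.

    Returns a dict mapping file name -> list of row dicts (all values are strings).
    """
    tables = {}
    pattern = os.path.join(directory, outfile_prefix + "*.tsv")
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            tables[os.path.basename(path)] = list(reader)
    return tables


# Example: list whatever tables exist for this run (prints nothing if none are found).
for name, rows in load_conga_tables("tmp_rhesus").items():
    print(name, len(rows))
```

Values come back as strings, so numeric columns such as conga_score need an explicit float() conversion before sorting or filtering.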
Command:
scripts/run_conga.py --graph_vs_graph --graph_vs_features --find_hotspot_features --tcr_clumping --match_to_tcr_database --gex_data filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file tmp_rhesus_clones.tsv --organism human --outfile_prefix tmp_rhesus
Stats
num_cells_w_gex: 15491
num_features_start: 26530
num_cells_w_tcr: 1418
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 91
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 43
num_TR_genes_in_hvg_set: 36
num_highly_variable_genes: 2088
num_cells_after_filtering: 1327
num_clonotypes: 1022
max_clonotype_size: 64
num_singleton_clonotypes: 942
nbr_frac_for_nndists: 0.01
num_gvg_hit_clonotypes: 78
num_gvg_hit_biclusters: 5
graph_vs_graph
Graph vs graph analysis looks for correlation between GEX and TCR space
by finding statistically significant overlap between two similarity graphs,
one defined by GEX similarity and one by TCR sequence similarity.
Overlap is defined one node (clonotype) at a time by looking for overlap
between that node's neighbors in the GEX graph and its neighbors in the
TCR graph. The null model is that the two neighbor sets are chosen
independently at random.
CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where
K = neighborhood size is specified as a fraction of the number of
clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where
each clonotype is connected to all the other clonotypes in the same
(GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN,
GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the
K values (called nbr_fracs short for neighbor fractions).
Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):
conga_score = P value for GEX/TCR overlap * number of clonotypes
mait_fraction = fraction of the overlap made up of 'invariant' T cells
num_neighbors* = size of neighborhood (K)
cluster_size = size of cluster (for KNN v cluster graph overlaps)
clone_index = 0-index of clonotype in adata object
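The null model described above (two neighbor sets chosen independently at random) corresponds to a hypergeometric upper-tail probability. The sketch below illustrates that calculation under stated assumptions: the function names are hypothetical, the candidate pool is taken to be all clonotypes other than the central one, and CoNGA's own implementation may count neighbors or correct overlaps differently (see the overlap_corrected column), so the numbers are not expected to reproduce the table exactly.

```python
from math import comb


def overlap_pvalue(overlap, num_gex_nbrs, num_tcr_nbrs, num_candidates):
    """P(X >= overlap) when num_tcr_nbrs neighbors are drawn at random from
    num_candidates clonotypes, of which num_gex_nbrs are GEX neighbors."""
    total = comb(num_candidates, num_tcr_nbrs)
    upper = min(num_gex_nbrs, num_tcr_nbrs)
    tail = sum(
        comb(num_gex_nbrs, k) * comb(num_candidates - num_gex_nbrs, num_tcr_nbrs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def conga_score(overlap, num_gex_nbrs, num_tcr_nbrs, num_clonotypes):
    """P value for the overlap multiplied by the number of clonotypes."""
    # Assumption: the central clonotype is excluded from the candidate pool.
    pvalue = overlap_pvalue(overlap, num_gex_nbrs, num_tcr_nbrs, num_clonotypes - 1)
    return pvalue * num_clonotypes
```

For the cluster-based graphs, the cluster size plays the role of the neighborhood size on that side. The num_neighbors values of 10.0 and 102.0 in the table are consistent with nbr_frac (0.01 and 0.10) multiplied by the 1022 clonotypes and rounded to a whole number.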
| conga_score | num_neighbors_gex | num_neighbors_tcr | overlap | overlap_corrected | mait_fraction | clone_index | nbr_frac | graph_overlap_type | cluster_size | gex_cluster | tcr_cluster | va | ja | cdr3a | vb | jb | cdr3b |
| 1.362550e-08 | NaN | 10.0 | 10 | 9 | 1.000000 | 27 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQGPGGASVLTF |
| 1.261473e-07 | NaN | 10.0 | 10 | 8 | 1.000000 | 35 | 0.01 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV24-1*01 | TRBJ2-6*01 | CATSSDSSGASVLTF |
| 1.261473e-07 | NaN | 10.0 | 10 | 8 | 1.000000 | 33 | 0.01 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-2*01 | TRBJ2-6*01 | CASSVDSDSGASVLTF |
| 2.060094e-07 | NaN | 10.0 | 10 | 8 | 1.000000 | 31 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDQNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQDQEGTGASVLTF |
| 1.572245e-06 | NaN | 10.0 | 9 | 8 | 1.000000 | 28 | 0.01 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQVPGTGASVLTF |
| 1.996482e-06 | NaN | 10.0 | 9 | 8 | 1.000000 | 22 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAALDSNYQLIW | TRBV4-3*01 | TRBJ1-2*01 | CASSQGGAPDYDYTF |
| 1.996482e-06 | NaN | 10.0 | 9 | 8 | 1.000000 | 41 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVWDSNYQLIW | TRBV4-3*01 | TRBJ2-2*01 | CASSQDWGEPGAQLFF |
| 6.715313e-06 | NaN | 10.0 | 8 | 8 | 1.000000 | 36 | 0.01 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-1*01 | TRBJ2-2*01 | CASSVRGETAQLFF |
| 1.763080e-05 | NaN | 10.0 | 9 | 7 | 1.000000 | 38 | 0.01 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-2*01 | TRBJ2-6*01 | CASSEAASGASVLTF |
| 2.688248e-05 | NaN | 10.0 | 9 | 7 | 1.000000 | 23 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAAMDSNYQLIW | TRBV4-2*01 | TRBJ2-2*01 | CASSQAMGEHGAQLFF |
| 2.688248e-05 | NaN | 10.0 | 9 | 7 | 1.000000 | 32 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDRDYQLIW | TRBV19*01 | TRBJ2-6*01 | CASSSGNSGASVLTF |
| 6.491113e-05 | NaN | 102.0 | 23 | 21 | 0.652174 | 34 | 0.10 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-3*01 | TRBJ1-5*01 | CASSEGGEVNQPQYF |
| 6.491113e-05 | NaN | 102.0 | 23 | 21 | 0.652174 | 38 | 0.10 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-2*01 | TRBJ2-6*01 | CASSEAASGASVLTF |
| 1.042252e-04 | NaN | 10.0 | 8 | 7 | 1.000000 | 29 | 0.01 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQDIGGSSGASVLTF |
| 1.599297e-04 | NaN | 102.0 | 26 | 20 | 0.730769 | 32 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDRDYQLIW | TRBV19*01 | TRBJ2-6*01 | CASSSGNSGASVLTF |
| 3.012966e-04 | NaN | 10.0 | 7 | 7 | 1.000000 | 34 | 0.01 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-3*01 | TRBJ1-5*01 | CASSEGGEVNQPQYF |
| 3.019490e-04 | NaN | 10.0 | 9 | 6 | 1.000000 | 30 | 0.01 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV6-3*01 | TRBJ2-6*01 | CASNMRHSGASVLTF |
| 3.611443e-04 | NaN | 102.0 | 22 | 20 | 0.681818 | 33 | 0.10 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-2*01 | TRBJ2-6*01 | CASSVDSDSGASVLTF |
| 3.621572e-04 | NaN | 10.0 | 9 | 6 | 1.000000 | 26 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAPRDSNYQLIW | TRBV6-1*01 | TRBJ2-6*01 | CASSEGYSGASVLTF |
| 3.621572e-04 | NaN | 10.0 | 9 | 6 | 1.000000 | 25 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAPMDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSWDNSGASVLTF |
| 3.621572e-04 | NaN | 10.0 | 9 | 6 | 1.000000 | 39 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSDGGESGASVLTF |
| 6.274256e-04 | NaN | 102.0 | 38 | 38 | 0.000000 | 567 | 0.10 | gex_cluster_vs_tcr_nbr | 182.0 | 2 | 6 | TRAV27*01 | TRAJ53*01 | CAGAYSGSSNYKLTF | TRBV20-1*01 | TRBJ2-2*01 | CSARRRTNTAQLFF |
| 8.686281e-04 | NaN | 102.0 | 25 | 19 | 0.760000 | 39 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSDGGESGASVLTF |
| 8.686281e-04 | NaN | 102.0 | 25 | 19 | 0.760000 | 25 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAPMDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSWDNSGASVLTF |
| 8.686281e-04 | NaN | 102.0 | 25 | 19 | 0.760000 | 27 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQGPGGASVLTF |
| 1.163187e-03 | 10.0 | NaN | 6 | 6 | 1.000000 | 36 | 0.01 | gex_nbr_vs_tcr_cluster | 46.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-1*01 | TRBJ2-2*01 | CASSVRGETAQLFF |
| 1.847839e-03 | NaN | 102.0 | 21 | 19 | 0.714286 | 35 | 0.10 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV24-1*01 | TRBJ2-6*01 | CATSSDSSGASVLTF |
| 1.980417e-03 | 102.0 | NaN | 26 | 26 | 0.000000 | 634 | 0.10 | gex_nbr_vs_tcr_cluster | 104.0 | 5 | 3 | TRAV38-1*01 | TRAJ43*01 | CAFMKENNDIRF | TRBV9*01 | TRBJ1-3*01 | CASSLGQESGNTVYF |
| 4.247237e-03 | 102.0 | NaN | 20 | 20 | 0.000000 | 218 | 0.10 | gex_nbr_vs_tcr_cluster | 70.0 | 2 | 6 | TRAV13-1*01 | TRAJ47*01 | CAAIFYGNKLIF | TRBV20-1*01 | TRBJ1-3*01 | CSALNGGSGNTVYF |
| 4.329108e-03 | NaN | 102.0 | 24 | 18 | 0.791667 | 18 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ20*01 | CAVRDRDYKLSF | TRBV4-3*01 | TRBJ2-4*01 | CASSQDLGGSDTQYF |
| 4.329108e-03 | NaN | 102.0 | 24 | 18 | 0.791667 | 26 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAPRDSNYQLIW | TRBV6-1*01 | TRBJ2-6*01 | CASSEGYSGASVLTF |
| 4.552048e-03 | 102.0 | NaN | 21 | 15 | 0.904762 | 24 | 0.10 | gex_nbr_vs_tcr_cluster | 51.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAFMDSNYQLIW | TRBV6-1*01 | TRBJ2-3*01 | CASSGTGDTDPQYF |
| 4.715481e-03 | NaN | 10.0 | 7 | 6 | 1.000000 | 18 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ20*01 | CAVRDRDYKLSF | TRBV4-3*01 | TRBJ2-4*01 | CASSQDLGGSDTQYF |
| 6.117287e-03 | NaN | 102.0 | 22 | 18 | 0.772727 | 30 | 0.10 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV6-3*01 | TRBJ2-6*01 | CASNMRHSGASVLTF |
| 6.117287e-03 | NaN | 102.0 | 22 | 18 | 0.772727 | 29 | 0.10 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQDIGGSSGASVLTF |
| 8.682594e-03 | NaN | 102.0 | 20 | 18 | 0.750000 | 36 | 0.10 | gex_cluster_vs_tcr_nbr | 64.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-1*01 | TRBJ2-2*01 | CASSVRGETAQLFF |
| 1.067573e-02 | 10.0 | NaN | 7 | 5 | 0.857143 | 27 | 0.01 | gex_nbr_vs_tcr_cluster | 51.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQGPGGASVLTF |
| 1.122430e-02 | 102.0 | NaN | 32 | 32 | 0.000000 | 906 | 0.10 | gex_nbr_vs_tcr_cluster | 156.0 | 0 | 1 | TRAV8-3*01 | TRAJ8*01 | CAVSERNTGFQKLVF | TRBV23-1*01 | TRBJ2-7*01 | CASSPQGEYEQYF |
| 1.976548e-02 | NaN | 102.0 | 23 | 17 | 0.826087 | 41 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVWDSNYQLIW | TRBV4-3*01 | TRBJ2-2*01 | CASSQDWGEPGAQLFF |
| 1.976548e-02 | NaN | 102.0 | 23 | 17 | 0.826087 | 23 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAAMDSNYQLIW | TRBV4-2*01 | TRBJ2-2*01 | CASSQAMGEHGAQLFF |
| 1.976548e-02 | NaN | 102.0 | 23 | 17 | 0.826087 | 31 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVRDQNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQDQEGTGASVLTF |
| 1.976548e-02 | NaN | 102.0 | 23 | 17 | 0.826087 | 40 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVTDSNYQLIW | TRBV6-1*01 | TRBJ1-2*01 | CASSDWDSNYDYTF |
| 2.563956e-02 | 10.0 | NaN | 6 | 5 | 1.000000 | 23 | 0.01 | gex_nbr_vs_tcr_cluster | 51.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAAMDSNYQLIW | TRBV4-2*01 | TRBJ2-2*01 | CASSQAMGEHGAQLFF |
| 2.713711e-02 | NaN | 102.0 | 21 | 17 | 0.809524 | 28 | 0.10 | gex_cluster_vs_tcr_nbr | 66.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQVPGTGASVLTF |
| 3.553112e-02 | NaN | 102.0 | 48 | 48 | 0.000000 | 825 | 0.10 | gex_cluster_vs_tcr_nbr | 296.0 | 0 | 1 | TRAV8-2*01 | TRAJ29*01 | CAVNVSGNRALVF | TRBV9*01 | TRBJ2-1*01 | CASSYRGWGDNEQFF |
| 4.789739e-02 | NaN | 10.0 | 7 | 5 | 1.000000 | 40 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAVTDSNYQLIW | TRBV6-1*01 | TRBJ1-2*01 | CASSDWDSNYDYTF |
| 4.789739e-02 | NaN | 10.0 | 7 | 5 | 1.000000 | 24 | 0.01 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAFMDSNYQLIW | TRBV6-1*01 | TRBJ2-3*01 | CASSGTGDTDPQYF |
| 5.452475e-02 | 10.0 | NaN | 5 | 5 | 1.000000 | 24 | 0.01 | gex_nbr_vs_tcr_cluster | 51.0 | 4 | 8 | TRAV1-2*01 | TRAJ33*01 | CAFMDSNYQLIW | TRBV6-1*01 | TRBJ2-3*01 | CASSGTGDTDPQYF |
| 5.815087e-02 | 102.0 | 102.0 | 23 | 23 | 0.000000 | 368 | 0.10 | gex_nbr_vs_tcr_nbr | NaN | 6 | 3 | TRAV19*01 | TRAJ43*01 | CALIQYNNNDIRF | TRBV6-1*01 | TRBJ2-3*01 | CASSDFGLGQGYPQYF |
| 6.007283e-02 | NaN | 102.0 | 20 | 17 | 0.700000 | 17 | 0.10 | gex_cluster_vs_tcr_nbr | 68.0 | 4 | 8 | TRAV1-2*01 | TRAJ12*01 | CAVRDPGDGGYKLIF | TRBV7-2*01 | TRBJ2-7*01 | CASSPSWSGSYAEQYF |
Omitted 70 lines
graph_vs_graph_logos
This figure summarizes the results of a CoNGA analysis that produces
scores (CoNGA) and clusters. At the top are six
2D UMAP projections of clonotypes in the dataset based on GEX similarity
(top left three panels) and TCR similarity (top right three panels),
colored from left to right by GEX cluster assignment;
CoNGA score; joint GEX:TCR cluster assignment for
clonotypes with significant CoNGA scores,
using a bicolored disk whose left half indicates GEX cluster and whose right
half indicates TCR cluster; TCR cluster; CoNGA score; and GEX:TCR cluster
assignments for CoNGA hits, as in the third panel.
Below are two rows of GEX landscape plots colored by (first row, left)
expression of selected marker genes, (second row, left) Z-score normalized and
GEX-neighborhood averaged expression of the same marker genes, and
(both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for
TCR feature descriptions).
GEX and TCR sequence features of CoNGA hits in clusters with
5 or more hits are summarized by a series
of logo-style visualizations, from left to right:
differentially expressed genes (DEGs); TCR sequence logos showing the V and
J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased
TCR sequence scores, with red indicating elevated scores and blue indicating
decreased scores relative to the rest of the dataset (see CoNGA manuscript
Table S3 for score definitions); GEX 'logos' for each cluster
consisting of a panel of marker genes shown with red disks colored by
mean expression and sized according to the fraction of cells expressing
the gene (gene names are given above).
DEG and TCRseq sequence logos are scaled
by the adjusted P value of the associations, with full logo height requiring
a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown
in gray. Each cluster is indicated by a bicolored disk colored according to
GEX cluster (left half) and TCR cluster (right half). The two numbers above
each disk show the number of hits within the cluster (on the left) and
the total number of cells in those clonotypes (on the right). The dendrogram
at the left shows similarity relationships among the clusters based on
connections in the GEX and TCR neighbor graphs.
The choice of which marker genes to use for the GEX umap panels and for the
cluster GEX logos can be configured using run_conga.py command line flags
or arguments to the conga.plotting.make_logo_plots function.
Image source: tmp_rhesus_graph_vs_graph_logos.png
tcr_clumping
This table stores the results of the TCR "clumping"
analysis, which looks for neighborhoods in TCR space with more TCRs than
expected by chance under a simple null model of VDJ rearrangement.
For each TCR in the dataset, we count how many TCRs are within a set of
fixed TCRdist radii (defaults: 24,48,72,96), and compare that number
to the expected number given the size of the dataset using a Poisson
model. The approach is inspired by the ALICE and TCRnet methods.
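The Poisson comparison described above can be sketched as follows. This is an illustration under assumptions, not CoNGA's code: the function names are hypothetical, and the scaling of the raw P value by the number of clonotypes (1022) and the number of radii (4) is inferred from the first row of the table below (num_nbrs = 7, expected_num_nbrs = 0.016757), which this sketch reproduces to within rounding.

```python
from math import exp, lgamma, log


def poisson_sf_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean).

    The upper tail is summed directly so that very small probabilities
    are not lost to cancellation in 1 - CDF.
    """
    if k <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    term = exp(-mean + k * log(mean) - lgamma(k + 1))  # P(X == k)
    total = 0.0
    i = k
    while term > 0.0:
        total += term
        i += 1
        term *= mean / i  # P(X == i) from P(X == i - 1)
        if term < total * 1e-17:
            break
    return total


def clumping_pvalue_adj(num_nbrs, expected_num_nbrs, num_tcrs, num_radii=4):
    # Assumed multiple-testing factor: one test per TCR per radius.
    return poisson_sf_at_least(num_nbrs, expected_num_nbrs) * num_tcrs * num_radii


print(clumping_pvalue_adj(7, 0.016757, 1022))
```

Because expected_num_nbrs is printed with only six decimals, small discrepancies against the reported pvalue_adj values are to be expected.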
Columns:
clump_type='global' unless we are optionally looking for TCR clumps within
the individual GEX clusters
num_nbrs = neighborhood size (number of other TCRs with TCRdist within nbr_radius)
| clump_type | clone_index | nbr_radius | pvalue_adj | num_nbrs | expected_num_nbrs | raw_count | va | ja | cdr3a | vb | jb | cdr3b | clonotype_fdr_value | clumping_group | clusters_gex | clusters_tcr |
| global | 25 | 96 | 2.966006e-13 | 7 | 0.016757 | 41032.0 | TRAV1-2*01 | TRAJ33*01 | CAPMDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSWDNSGASVLTF | 2.966006e-13 | 1 | 4 | 8 |
| global | 39 | 96 | 7.272156e-13 | 7 | 0.019053 | 46654.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSDGGESGASVLTF | 3.636078e-13 | 1 | 4 | 8 |
| global | 26 | 96 | 1.311137e-12 | 9 | 0.079370 | 194344.0 | TRAV1-2*01 | TRAJ33*01 | CAPRDSNYQLIW | TRBV6-1*01 | TRBJ2-6*01 | CASSEGYSGASVLTF | 4.370455e-13 | 1 | 4 | 8 |
| global | 697 | 24 | 4.369259e-12 | 4 | 0.000400 | 980.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CAIAGTNYGEQFF | 1.092315e-12 | 2 | 3 | 5 |
| global | 692 | 24 | 1.352794e-11 | 4 | 0.000531 | 1300.0 | TRAV4*01 | TRAJ33*01 | CLGGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 696 | 24 | 1.352794e-11 | 4 | 0.000531 | 1300.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 32 | 96 | 2.862912e-09 | 5 | 0.009674 | 23687.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDRDYQLIW | TRBV19*01 | TRBJ2-6*01 | CASSSGNSGASVLTF | 4.089874e-10 | 1 | 4 | 8 |
| global | 698 | 24 | 1.332229e-08 | 3 | 0.000269 | 661.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSATNYGEQFF | 1.665286e-09 | 2 | 1 | 5 |
| global | 697 | 48 | 1.965991e-08 | 4 | 0.003280 | 8031.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CAIAGTNYGEQFF | 1.092315e-12 | 2 | 3 | 5 |
| global | 696 | 48 | 6.778152e-08 | 4 | 0.004470 | 10946.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 692 | 48 | 6.778152e-08 | 4 | 0.004470 | 10946.0 | TRAV4*01 | TRAJ33*01 | CLGGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 699 | 24 | 1.013257e-07 | 3 | 0.000530 | 1300.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 8.443809e-09 | 2 | 3 | 5 |
| global | 35 | 96 | 8.392636e-06 | 4 | 0.014943 | 36770.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV24-1*01 | TRBJ2-6*01 | CATSSDSSGASVLTF | 6.455874e-07 | 1 | 4 | 8 |
| global | 698 | 48 | 1.147078e-05 | 3 | 0.002565 | 6292.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSATNYGEQFF | 1.665286e-09 | 2 | 1 | 5 |
| global | 697 | 72 | 3.917167e-05 | 4 | 0.021995 | 53857.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CAIAGTNYGEQFF | 1.092315e-12 | 2 | 3 | 5 |
| global | 38 | 96 | 3.962047e-05 | 5 | 0.065747 | 161778.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-2*01 | TRBJ2-6*01 | CASSEAASGASVLTF | 2.476280e-06 | 1 | 4 | 8 |
| global | 30 | 96 | 4.531022e-05 | 4 | 0.022814 | 55972.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV6-3*01 | TRBJ2-6*01 | CASNMRHSGASVLTF | 2.665307e-06 | 1 | 4 | 8 |
| global | 699 | 48 | 6.030807e-05 | 3 | 0.004462 | 10946.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 8.443809e-09 | 2 | 3 | 5 |
| global | 696 | 72 | 1.350910e-04 | 4 | 0.030022 | 73511.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 692 | 72 | 1.350910e-04 | 4 | 0.030022 | 73511.0 | TRAV4*01 | TRAJ33*01 | CLGGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 27 | 96 | 9.163108e-04 | 4 | 0.048630 | 119075.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQGPGGASVLTF | 4.363385e-05 | 1 | 4 | 8 |
| global | 26 | 72 | 1.032063e-03 | 3 | 0.011518 | 28202.0 | TRAV1-2*01 | TRAJ33*01 | CAPRDSNYQLIW | TRBV6-1*01 | TRBJ2-6*01 | CASSEGYSGASVLTF | 4.370455e-13 | 1 | 4 | 8 |
| global | 28 | 96 | 3.820143e-03 | 3 | 0.017845 | 43780.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQVPGTGASVLTF | 1.660932e-04 | 1 | 4 | 8 |
| global | 528 | 24 | 5.008615e-03 | 1 | 0.000001 | 3.0 | TRAV27*01 | TRAJ17*01 | CAGEEVASNKLTF | TRBV21-1*01 | TRBJ1-5*01 | CASSKGQGDQPQYF | 2.086923e-04 | 5 | 3 | 4 |
| global | 698 | 72 | 5.386327e-03 | 3 | 0.020021 | 49119.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSATNYGEQFF | 1.665286e-09 | 2 | 1 | 5 |
| global | 28 | 72 | 1.117498e-02 | 2 | 0.002340 | 5741.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQVPGTGASVLTF | 1.660932e-04 | 1 | 4 | 8 |
| global | 39 | 72 | 1.784486e-02 | 2 | 0.002958 | 7242.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-1*01 | TRBJ2-6*01 | CASSDGGESGASVLTF | 3.636078e-13 | 1 | 4 | 8 |
| global | 699 | 72 | 1.792119e-02 | 3 | 0.029963 | 73511.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 8.443809e-09 | 2 | 3 | 5 |
| global | 620 | 24 | 2.170395e-02 | 1 | 0.000005 | 13.0 | TRAV35*01 | TRAJ58*01 | CAGQRQTGGSRLTF | TRBV14*01 | TRBJ1-2*01 | CASSQGYDYTF | 7.234651e-04 | 6 | 0 | 4 |
| global | 621 | 24 | 2.170395e-02 | 1 | 0.000005 | 13.0 | TRAV35*01 | TRAJ58*01 | CAGQRQTGGSRLTF | TRBV14*01 | TRBJ1-2*01 | CASSQGYDYTF | 7.234651e-04 | 6 | 0 | 4 |
| global | 697 | 96 | 2.312312e-02 | 4 | 0.110345 | 270189.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CAIAGTNYGEQFF | 1.092315e-12 | 2 | 3 | 5 |
| global | 529 | 24 | 4.173827e-02 | 1 | 0.000010 | 25.0 | TRAV27*01 | TRAJ17*01 | CAGEGVASNKLTF | TRBV21-1*01 | TRBJ1-5*01 | CASSSGQGDQPQYF | 1.304321e-03 | 5 | 1 | 4 |
| global | 268 | 24 | 5.008587e-02 | 1 | 0.000012 | 30.0 | TRAV17*01 | TRAJ27*01 | CATDANADKLTF | TRBV19*01 | TRBJ2-4*01 | CASGQGGQNTQYF | 1.517754e-03 | 4 | 3 | 2 |
| global | 269 | 24 | 6.678102e-02 | 1 | 0.000016 | 40.0 | TRAV17*01 | TRAJ27*01 | CATDTNADKLTF | TRBV19*01 | TRBJ2-4*01 | CASGQGGQNTQYF | 1.964148e-03 | 4 | 3 | 2 |
| global | 31 | 96 | 8.341992e-02 | 2 | 0.006402 | 15676.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDQNYQLIW | TRBV4-2*01 | TRBJ2-6*01 | CASSQDQEGTGASVLTF | 2.372477e-03 | 1 | 4 | 8 |
| global | 692 | 96 | 8.972842e-02 | 4 | 0.156296 | 382703.0 | TRAV4*01 | TRAJ33*01 | CLGGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 696 | 96 | 8.972842e-02 | 4 | 0.156296 | 382703.0 | TRAV4*01 | TRAJ33*01 | CLVGDSNYQLIW | TRBV10-1*01 | TRBJ2-1*01 | CASSGTNYGEQFF | 2.254656e-12 | 2 | 3 | 5 |
| global | 70 | 24 | 9.015412e-02 | 1 | 0.000022 | 54.0 | TRAV12-1*01 | TRAJ36*01 | CAVRTGVNNLFF | TRBV23-1*01 | TRBJ2-7*01 | CASSQTGTGSYEQYF | 2.372477e-03 | 3 | 6 | 0 |
| global | 27 | 72 | 9.566851e-02 | 2 | 0.006857 | 16790.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQGPGGASVLTF | 4.363385e-05 | 1 | 4 | 8 |
| global | 71 | 24 | 1.068491e-01 | 1 | 0.000026 | 64.0 | TRAV12-1*01 | TRAJ36*01 | CAVRTGVNNLFF | TRBV23-1*01 | TRBJ2-7*01 | CASSSTGTGSYEQYF | 2.671228e-03 | 3 | 6 | 0 |
| global | 37 | 96 | 1.629249e-01 | 3 | 0.063054 | 155152.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-1*01 | TRBJ2-6*01 | CASSEARTGASVLTF | 3.973778e-03 | 1 | 2 | 8 |
| global | 38 | 72 | 1.743967e-01 | 2 | 0.009266 | 22799.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV6-2*01 | TRBJ2-6*01 | CASSEAASGASVLTF | 2.476280e-06 | 1 | 4 | 8 |
| global | 29 | 96 | 2.466204e-01 | 2 | 0.011025 | 27048.0 | TRAV1-2*01 | TRAJ33*01 | CAVMDSNYQLIW | TRBV4-3*01 | TRBJ2-6*01 | CASSQDIGGSSGASVLTF | 5.735359e-03 | 1 | 4 | 8 |
| global | 621 | 48 | 3.138613e-01 | 1 | 0.000077 | 188.0 | TRAV35*01 | TRAJ58*01 | CAGQRQTGGSRLTF | TRBV14*01 | TRBJ1-2*01 | CASSQGYDYTF | 7.234651e-04 | 6 | 0 | 4 |
| global | 620 | 48 | 3.138613e-01 | 1 | 0.000077 | 188.0 | TRAV35*01 | TRAJ58*01 | CAGQRQTGGSRLTF | TRBV14*01 | TRBJ1-2*01 | CASSQGYDYTF | 7.234651e-04 | 6 | 0 | 4 |
| global | 528 | 48 | 3.222084e-01 | 1 | 0.000079 | 193.0 | TRAV27*01 | TRAJ17*01 | CAGEEVASNKLTF | TRBV21-1*01 | TRBJ1-5*01 | CASSKGQGDQPQYF | 2.086923e-04 | 5 | 3 | 4 |
| global | 33 | 96 | 4.254261e-01 | 2 | 0.014497 | 35671.0 | TRAV1-2*01 | TRAJ33*01 | CAVRDSNYQLIW | TRBV10-2*01 | TRBJ2-6*01 | CASSVDSDSGASVLTF | 8.912654e-03 | 1 | 4 | 8 |
| global | 23 | 96 | 4.278074e-01 | 2 | 0.014537 | 35596.0 | TRAV1-2*01 | TRAJ33*01 | CAAMDSNYQLIW | TRBV4-2*01 | TRBJ2-2*01 | CASSQAMGEHGAQLFF | 8.912654e-03 | 1 | 4 | 8 |
| global | 529 | 48 | 6.610841e-01 | 1 | 0.000162 | 396.0 | TRAV27*01 | TRAJ17*01 | CAGEGVASNKLTF | TRBV21-1*01 | TRBJ1-5*01 | CASSSGQGDQPQYF | 1.304321e-03 | 5 | 1 | 4 |
| global | 268 | 48 | 8.580531e-01 | 1 | 0.000210 | 514.0 | TRAV17*01 | TRAJ27*01 | CATDANADKLTF | TRBV19*01 | TRBJ2-4*01 | CASGQGGQNTQYF | 1.517754e-03 | 4 | 3 | 2 |
tcr_clumping_logos
This figure summarizes the results of a CoNGA analysis that produces
scores (TCR clumping) and clusters. At the top are six
2D UMAP projections of clonotypes in the dataset based on GEX similarity
(top left three panels) and TCR similarity (top right three panels),
colored from left to right by GEX cluster assignment;
TCR clumping score; joint GEX:TCR cluster assignment for
clonotypes with significant TCR clumping scores,
using a bicolored disk whose left half indicates GEX cluster and whose right
half indicates TCR cluster; TCR cluster; TCR clumping score; and GEX:TCR cluster
assignments for TCR clumping hits, as in the third panel.
Below are two rows of GEX landscape plots colored by (first row, left)
expression of selected marker genes, (second row, left) Z-score normalized and
GEX-neighborhood averaged expression of the same marker genes, and
(both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for
TCR feature descriptions).
GEX and TCR sequence features of TCR clumping hits in clusters with
3 or more hits are summarized by a series
of logo-style visualizations, from left to right:
differentially expressed genes (DEGs); TCR sequence logos showing the V and
J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased
TCR sequence scores, with red indicating elevated scores and blue indicating
decreased scores relative to the rest of the dataset (see CoNGA manuscript
Table S3 for score definitions); GEX 'logos' for each cluster
consisting of a panel of marker genes shown with red disks colored by
mean expression and sized according to the fraction of cells expressing
the gene (gene names are given above).
DEG and TCRseq sequence logos are scaled
by the adjusted P value of the associations, with full logo height requiring
a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown
in gray. Each cluster is indicated by a bicolored disk colored according to
GEX cluster (left half) and TCR cluster (right half). The two numbers above
each disk show the number of hits within the cluster (on the left) and
the total number of cells in those clonotypes (on the right). The dendrogram
at the left shows similarity relationships among the clusters based on
connections in the GEX and TCR neighbor graphs.
The choice of which marker genes to use for the GEX umap panels and for the
cluster GEX logos can be configured using run_conga.py command line flags
or arguments to the conga.plotting.make_logo_plots function.
Image source: tmp_rhesus_tcr_clumping_logos.png
tcr_db_match
This table stores significant matches between
TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv
P values of matches are assigned by turning the raw TCRdist
score into a P value based on a model of the V(D)J rearrangement
process, so matches between TCRs that are very far from germline
(for example) are assigned a higher significance.
Columns:
tcrdist: TCRdist distance between the two TCRs (adata query and db hit)
pvalue_adj: raw P value of the match * num query TCRs * num db TCRs
fdr_value: Benjamini-Hochberg FDR value for match
clone_index: index within adata of the query TCR clonotype
db_index: index of the hit in the database being matched
va,ja,cdr3a,vb,jb,cdr3b
db_XXX: where XXX is a field in the literature database
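The two multiple-testing columns described above can be illustrated with a short sketch. The function names are hypothetical; the pvalue_adj formula follows the column description, while the Benjamini-Hochberg routine is the textbook procedure and is only assumed to correspond to how fdr_value is computed.

```python
def match_pvalue_adj(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """Raw match P value scaled by the number of query/database TCR pairs."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The scaled pvalue_adj is a Bonferroni-style correction and is the more conservative of the two; the FDR value controls the expected fraction of false matches among those reported.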
tcr_graph_vs_gex_features
This table has results from a graph-vs-features analysis in which we
look for genes that are differentially expressed (elevated) in specific
neighborhoods of the TCR neighbor graph. Differential expression is
assessed first by a t-test, for speed, and then
by a Mann-Whitney U test for neighborhood/score combinations whose t-test
P value passes an initial threshold (default is 10 times the P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
gene.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the gene
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
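The relationship between mean_fg, mean_bg, and log2enr is not spelled out above. The sketch below shows one reading that is consistent with the first rows of the table: both means are treated as being on a log1p (natural log of 1 + x) scale and are converted back before taking the ratio. This is an inference from the reported numbers, not a documented formula, and the function name is hypothetical.

```python
from math import expm1, log2


def log2_enrichment(mean_fg, mean_bg):
    """log2 fold change of neighborhood vs background expression.

    Assumes both means are on a log1p scale, so expm1 recovers the
    linear-scale values before the ratio is taken.
    """
    return log2(expm1(mean_fg) / expm1(mean_bg))


# First table row: mean_fg = 2.432636, mean_bg = 0.175342, reported log2enr = 5.760394
print(round(log2_enrichment(2.432636, 0.175342), 2))
```

The adjusted P values are the raw values multiplied by the number of comparisons without capping, which is why some ttest_pvalue_adj entries in the table exceed 1.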
| ttest_pvalue_adj | mwu_pvalue_adj | log2enr | gex_cluster | tcr_cluster | feature | mean_fg | mean_bg | num_fg | clone_index | mait_fraction | nbr_frac | graph_type | feature_type |
| 2.892767e-19 | 3.397924e-74 | 5.760394 | 2 | 6 | ENSMMUG00000043894 | 2.432636 | 0.175342 | 71 | -1 | 0.0 | 0.0 | tcr_cluster | gex |
| 1.089014e-08 | 1.064123e-60 | 5.712788 | 0 | 10 | ENSMMUG00000061119 | 1.753271 | 0.087108 | 36 | -1 | 0.0 | 0.0 | tcr_cluster | gex |
| 1.364521e-10 | 8.285322e-47 | 4.522135 | 2 | 6 | ENSMMUG00000043894 | 1.703045 | 0.178513 | 103 | 415 | 0.0 | 0.1 | tcr_nbr | gex |
| 7.391521e-09 | 5.327159e-41 | 4.325845 | 2 | 6 | ENSMMUG00000043894 | 1.631611 | 0.186519 | 103 | 107 | 0.0 | 0.1 | tcr_nbr | gex |
| 8.861353e-09 | 2.750179e-39 | 4.315207 | 2 | 6 | ENSMMUG00000043894 | 1.627734 | 0.186954 | 103 | 626 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.066617e-08 | 6.673187e-39 | 4.273937 | 2 | 6 | ENSMMUG00000043894 | 1.612688 | 0.188640 | 103 | 248 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.683860e-01 | 6.759251e-36 | 6.061486 | 0 | 1 | ENSMMUG00000060662 | 0.797168 | 0.018091 | 103 | 924 | 0.0 | 0.1 | tcr_nbr | gex |
| 5.602167e-08 | 2.577416e-35 | 4.175139 | 2 | 6 | ENSMMUG00000043894 | 1.576652 | 0.192679 | 103 | 301 | 0.0 | 0.1 | tcr_nbr | gex |
| 2.262062e-07 | 6.309623e-35 | 4.071092 | 2 | 6 | ENSMMUG00000043894 | 1.538696 | 0.196933 | 103 | 59 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.756006e-07 | 1.146475e-34 | 4.027714 | 2 | 6 | ENSMMUG00000043894 | 1.522876 | 0.198706 | 103 | 178 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.151267e-07 | 1.506979e-34 | 4.235354 | 2 | 6 | ENSMMUG00000043894 | 1.598617 | 0.190217 | 103 | 785 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.537118e-07 | 8.457291e-34 | 4.123988 | 2 | 6 | ENSMMUG00000043894 | 1.557992 | 0.194770 | 103 | 482 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.188034e-07 | 4.091844e-33 | 4.003964 | 2 | 6 | ENSMMUG00000043894 | 1.514216 | 0.199677 | 103 | 479 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.532596e-07 | 4.468555e-33 | 4.001371 | 2 | 6 | ENSMMUG00000043894 | 1.513271 | 0.199783 | 103 | 418 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.167149e-07 | 5.078867e-33 | 4.003478 | 2 | 6 | ENSMMUG00000043894 | 1.514039 | 0.199697 | 103 | 432 | 0.0 | 0.1 | tcr_nbr | gex |
| 6.089620e-07 | 7.975956e-33 | 3.970755 | 2 | 6 | ENSMMUG00000043894 | 1.502111 | 0.201033 | 103 | 946 | 0.0 | 0.1 | tcr_nbr | gex |
| 5.613666e-07 | 1.155395e-32 | 3.947235 | 2 | 6 | ENSMMUG00000043894 | 1.493540 | 0.201994 | 103 | 101 | 0.0 | 0.1 | tcr_nbr | gex |
| 8.191486e-01 | 1.204519e-32 | 5.719125 | 0 | 1 | ENSMMUG00000060662 | 0.765785 | 0.021609 | 103 | 923 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.365246e-07 | 2.064042e-32 | 4.113990 | 2 | 6 | ENSMMUG00000043894 | 1.554344 | 0.195179 | 103 | 758 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.440516e-06 | 1.180600e-30 | 3.804360 | 2 | 6 | ENSMMUG00000043894 | 1.441542 | 0.207822 | 103 | 994 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.419424e+00 | 2.054940e-30 | 4.640636 | 2 | 4 | ENSMMUG00000052673 | 0.505643 | 0.026039 | 103 | 544 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.711255e+00 | 2.961670e-30 | 4.503088 | 2 | 4 | ENSMMUG00000052673 | 0.491586 | 0.027614 | 103 | 534 | 0.0 | 0.1 | tcr_nbr | gex |
| 2.219197e-06 | 8.983608e-30 | 3.919719 | 2 | 6 | ENSMMUG00000043894 | 1.483516 | 0.203118 | 103 | 158 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.759829e+00 | 1.224345e-29 | 5.504371 | 0 | 1 | ENSMMUG00000060662 | 0.744081 | 0.024041 | 103 | 933 | 0.0 | 0.1 | tcr_nbr | gex |
| 2.081256e+00 | 1.690525e-29 | 5.374644 | 0 | 1 | ENSMMUG00000060662 | 0.730228 | 0.025594 | 103 | 928 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.165294e+00 | 1.985984e-29 | 5.238837 | 0 | 1 | ENSMMUG00000060662 | 0.715144 | 0.027284 | 103 | 942 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.700937e-03 | 3.580677e-29 | 4.386767 | 2 | 2 | ENSMMUG00000062085 | 1.077587 | 0.088580 | 103 | 504 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.839802e+00 | 3.827329e-29 | 5.002207 | 0 | 1 | ENSMMUG00000060662 | 0.687489 | 0.030384 | 103 | 953 | 0.0 | 0.1 | tcr_nbr | gex |
| 2.944461e-06 | 4.075794e-29 | 3.822363 | 2 | 6 | ENSMMUG00000043894 | 1.448086 | 0.207088 | 103 | 511 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.894907e-01 | 5.332926e-29 | 4.193284 | 0 | 1 | ENSMMUG00000057062 | 0.456475 | 0.031133 | 103 | 882 | 0.0 | 0.1 | tcr_nbr | gex |
| 2.380103e-06 | 1.261190e-28 | 3.950724 | 2 | 6 | ENSMMUG00000043894 | 1.494811 | 0.201852 | 103 | 241 | 0.0 | 0.1 | tcr_nbr | gex |
| 6.128023e-02 | 2.962608e-28 | 4.347653 | 2 | 4 | ENSMMUG00000052673 | 0.483071 | 0.030048 | 100 | -1 | 0.0 | 0.0 | tcr_cluster | gex |
| 3.388499e-06 | 3.743188e-28 | 3.867162 | 2 | 6 | ENSMMUG00000043894 | 1.464382 | 0.205262 | 103 | 959 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.067582e+00 | 1.169482e-27 | 4.376205 | 2 | 4 | ENSMMUG00000052673 | 0.478284 | 0.029105 | 103 | 564 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.784933e+00 | 1.452422e-27 | 4.398505 | 2 | 4 | ENSMMUG00000052673 | 0.480644 | 0.028841 | 103 | 552 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.905234e+00 | 1.585636e-27 | 4.358971 | 2 | 4 | ENSMMUG00000052673 | 0.476454 | 0.029310 | 103 | 556 | 0.0 | 0.1 | tcr_nbr | gex |
| 3.802922e+00 | 1.623164e-27 | 4.365042 | 2 | 4 | ENSMMUG00000052673 | 0.477099 | 0.029238 | 103 | 548 | 0.0 | 0.1 | tcr_nbr | gex |
| 4.367138e+00 | 1.835152e-27 | 4.304220 | 2 | 4 | ENSMMUG00000052673 | 0.470606 | 0.029966 | 103 | 533 | 0.0 | 0.1 | tcr_nbr | gex |
| 4.236369e+00 | 2.186521e-27 | 4.200551 | 2 | 4 | ENSMMUG00000052673 | 0.459394 | 0.031222 | 103 | 563 | 0.0 | 0.1 | tcr_nbr | gex |
| 1.182252e-05 |
4.440368e-27 |
3.855867 |
2 |
6 |
ENSMMUG00000043894 |
1.460272 |
0.205723 |
103 |
46 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 7.825501e-06 |
4.814517e-27 |
3.864251 |
2 |
6 |
ENSMMUG00000043894 |
1.463323 |
0.205381 |
103 |
963 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 8.149115e-06 |
6.506373e-27 |
3.865820 |
2 |
6 |
ENSMMUG00000043894 |
1.463894 |
0.205317 |
103 |
949 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 8.783328e-06 |
8.852021e-27 |
3.831511 |
2 |
6 |
ENSMMUG00000043894 |
1.451413 |
0.206716 |
103 |
819 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 1.614262e-05 |
9.629497e-27 |
3.799316 |
2 |
6 |
ENSMMUG00000043894 |
1.439709 |
0.208027 |
103 |
185 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 5.078142e+00 |
1.343536e-26 |
5.122255 |
0 |
1 |
ENSMMUG00000060662 |
0.701731 |
0.028788 |
103 |
944 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 8.227295e+00 |
1.876362e-26 |
4.931847 |
0 |
1 |
ENSMMUG00000060662 |
0.678946 |
0.031341 |
103 |
941 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 2.051007e-05 |
4.690201e-26 |
3.660968 |
2 |
6 |
ENSMMUG00000043894 |
1.389521 |
0.213652 |
103 |
725 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 5.344732e+00 |
3.043984e-25 |
4.389552 |
2 |
4 |
ENSMMUG00000052673 |
0.479697 |
0.028947 |
103 |
530 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 5.919114e+00 |
3.446155e-25 |
4.359832 |
2 |
4 |
ENSMMUG00000052673 |
0.476545 |
0.029300 |
103 |
528 |
0.0 |
0.1 |
tcr_nbr |
gex |
| 5.919114e+00 |
3.446155e-25 |
4.359832 |
2 |
4 |
ENSMMUG00000052673 |
0.476545 |
0.029300 |
103 |
529 |
0.0 |
0.1 |
tcr_nbr |
gex |
Omitted 238 lines
tcr_graph_vs_gex_features_plot
This plot summarizes the results of a graph
versus features analysis by labeling the clonotypes at the center of
each biased neighborhood with the name of the feature biased in that
neighborhood. The feature names are drawn in colored boxes whose
color is determined by the strength and direction of the feature score bias
(from bright red for features that are strongly elevated to bright blue
for features that are strongly decreased in the corresponding neighborhoods,
relative to the rest of the dataset).
At most one feature (the top scoring) is shown for each clonotype
(ie, neighborhood). The UMAP xy coordinates for this plot are
stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations
is 'mwu_pvalue_adj'. The threshold score for displaying a feature is
1.0. The feature column is 'feature'. Since
we also run graph-vs-features using "neighbor" graphs that are defined
by clusters, ie where each clonotype is connected to all the other
clonotypes in the same cluster, some biased features may be associated with
a cluster rather than a specific clonotype. Those features are labeled with
a '*' at the end and shown near the centroid of the clonotypes belonging
to that cluster.
Image source: tmp_rhesus_tcr_graph_vs_gex_features_plot.png
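The labeling rule described for this plot (keep features whose score passes the threshold, then show at most one feature per clonotype) can be sketched as follows. This is a minimal illustration, not CoNGA's plotting code: the function name, the dictionary row format, and the example rows are hypothetical, and treating a score at or below 1.0 as passing is an assumption.

```python
def top_feature_per_clonotype(rows, score_column="mwu_pvalue_adj",
                              feature_column="feature", threshold=1.0):
    """Pick the best-scoring (lowest adjusted P value) feature per clonotype.

    rows: list of dicts, one per table row (hypothetical format).
    Rows whose score exceeds the threshold are dropped (assumed rule).
    """
    best = {}
    for row in rows:
        score = row[score_column]
        if score > threshold:
            continue  # not significant enough to be displayed
        clone = row["clone_index"]
        if clone not in best or score < best[clone][1]:
            best[clone] = (row[feature_column], score)
    return best


# Made-up rows for illustration only; GENE_A etc. are placeholders.
example_rows = [
    {"clone_index": 7, "feature": "GENE_A", "mwu_pvalue_adj": 1e-30},
    {"clone_index": 7, "feature": "GENE_B", "mwu_pvalue_adj": 1e-12},
    {"clone_index": 9, "feature": "GENE_C", "mwu_pvalue_adj": 4.5},
]
labels = top_feature_per_clonotype(example_rows)
print(labels)
```

Cluster-level rows (clone_index of -1) would all share one key here; handling them would need a different key, such as the cluster pair, which this sketch leaves out.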
tcr_graph_vs_gex_features_panels
Graph-versus-feature analysis was used to identify
a set of GEX features that showed biased distributions
in TCR neighborhoods. This plot shows the distribution of the
top-scoring GEX features on the TCR
UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie
Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons).
At most 3 features from clonotype neighborhoods
in each (GEX,TCR) cluster pair are shown. The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_tcr_graph_vs_gex_features_panels.png
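The neighbor averaging and plotting order described for these panels can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name and the toy data are hypothetical, and whether the clonotype's own score is included in its average is an assumption (exposed here as the include_self flag).

```python
def neighbor_average(scores, neighbors, include_self=True):
    """Average each clonotype's raw score over its K nearest neighbors.

    scores: raw feature score per clonotype.
    neighbors: neighbors[i] lists the indices of clonotype i's K neighbors.
    include_self: also count the clonotype's own score (assumption).
    """
    averaged = []
    for i, nbrs in enumerate(neighbors):
        values = [scores[j] for j in nbrs]
        if include_self:
            values.append(scores[i])
        averaged.append(sum(values) / len(values))
    return averaged


# Toy data (hypothetical): four clonotypes, K = 1.
raw_scores = [0.0, 3.0, 6.0, 9.0]
knn = [[1], [2], [3], [2]]
nbr_avg = neighbor_average(raw_scores, knn)

# Plot low scores first so that high-scoring points end up on top.
plot_order = sorted(range(len(nbr_avg)), key=lambda i: nbr_avg[i])

# The values shown in the upper corners of a panel.
panel_min, panel_max = min(nbr_avg), max(nbr_avg)
```

Drawing points in increasing score order means the highest-scoring clonotypes are painted last and are not hidden by overlapping low-scoring points.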
tcr_genes_vs_gex_features
This table has results from a graph-vs-features analysis in which we
look for genes that are differentially expressed (elevated) in specific
neighborhoods of the TCR neighbor graph. Differential expression is
assessed first by a t-test, for speed, and then by a Mann-Whitney U test
for those neighborhood/score combinations whose t-test P value passes an
initial threshold (by default, 10 times the P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
gene.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the gene
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
In this analysis the TCR graph is defined by
connecting all clonotypes that share the same VA, JA, VB, or JB gene segment
(the analysis is run four times, once for each gene segment type).
The gene_segment column names the segment shared by the clonotypes in each row.
| ttest_pvalue_adj | mwu_pvalue_adj | log2enr | gex_cluster | tcr_cluster | feature | mean_fg | mean_bg | num_fg | clone_index | mait_fraction | gene_segment | graph_type | feature_type |
| 7.680095e+00 | 1.487624e-218 | NaN | 6 | 5 | ENSMMUG00000048246 | 3.234783 | -1.873645e-09 | 6 | -1 | 0.000000 | TRBV6-8 | tcr_genes | gex |
| 8.028887e-14 | 2.917151e-168 | 10.269545 | 0 | 1 | ENSMMUG00000060662 | 2.728752 | 1.152940e-02 | 32 | -1 | 0.000000 | TRAV8-7 | tcr_genes | gex |
| 2.437364e-01 | 1.368213e-163 | 12.038369 | 0 | 1 | ENSMMUG00000059234 | 2.615368 | 3.008084e-03 | 9 | -1 | 0.000000 | TRBV25-1 | tcr_genes | gex |
| 5.133844e-04 | 3.438427e-155 | 9.972508 | 6 | 1 | ENSMMUG00000062897 | 2.403146 | 9.961364e-03 | 15 | -1 | 0.000000 | TRBV11-2 | tcr_genes | gex |
| 1.048790e-13 | 1.937740e-144 | 9.777905 | 0 | 10 | ENSMMUG00000063185 | 2.975444 | 2.096385e-02 | 31 | -1 | 0.000000 | TRBV4-2 | tcr_genes | gex |
| 6.739993e-26 | 3.745618e-142 | 8.207480 | 0 | 3 | ENSMMUG00000062085 | 2.643389 | 4.323633e-02 | 57 | -1 | 0.000000 | TRBV4-3 | tcr_genes | gex |
| 2.737426e-13 | 2.783030e-130 | 9.048454 | 0 | 1 | ENSMMUG00000062211 | 3.003197 | 3.552807e-02 | 33 | -1 | 0.000000 | TRBV12-2 | tcr_genes | gex |
| 2.132582e-04 | 1.006423e-124 | 8.088103 | 2 | 4 | ENSMMUG00000056431 | 1.916044 | 2.106862e-02 | 20 | -1 | 0.000000 | TRAV35 | tcr_genes | gex |
| 4.244478e-06 | 1.347413e-106 | 6.957203 | 2 | 4 | ENSMMUG00000052673 | 1.189393 | 1.822269e-02 | 49 | -1 | 0.000000 | TRAV27 | tcr_genes | gex |
| 3.387262e-05 | 2.640779e-105 | 8.074643 | 1 | 1 | ENSMMUG00000056910 | 1.911052 | 2.114114e-02 | 17 | -1 | 0.000000 | TRAV16 | tcr_genes | gex |
| 2.744935e-40 | 1.802070e-101 | 6.715584 | 2 | 6 | ENSMMUG00000043894 | 2.955590 | 1.598175e-01 | 63 | -1 | 0.000000 | TRBV20-1 | tcr_genes | gex |
| 2.658563e-05 | 6.566327e-98 | 7.110737 | 0 | 2 | ENSMMUG00000054409 | 1.627660 | 2.917654e-02 | 30 | -1 | 0.000000 | TRAV6 | tcr_genes | gex |
| 4.502685e-08 | 2.741429e-91 | 7.830931 | 0 | 0 | ENSMMUG00000065017 | 2.286982 | 3.811179e-02 | 23 | -1 | 0.000000 | TRAV12-1 | tcr_genes | gex |
| 3.779473e-03 | 1.091229e-84 | 7.699737 | 0 | 4 | ENSMMUG00000059325 | 1.992804 | 3.002155e-02 | 20 | -1 | 0.000000 | TRAV25 | tcr_genes | gex |
| 9.277174e-18 | 1.995422e-84 | 6.285125 | 0 | 10 | ENSMMUG00000061119 | 1.994806 | 7.828967e-02 | 36 | -1 | 0.000000 | TRAV18 | tcr_genes | gex |
| 1.553028e-04 | 2.356232e-74 | 5.894151 | 0 | 1 | ENSMMUG00000061081 | 0.880054 | 2.344868e-02 | 48 | -1 | 0.000000 | TRAV8-2 | tcr_genes | gex |
| 2.053014e-05 | 3.280283e-72 | 5.772890 | 0 | 1 | ENSMMUG00000057062 | 0.943915 | 2.830943e-02 | 51 | -1 | 0.000000 | TRAV8-3 | tcr_genes | gex |
| 5.678434e-61 | 9.028388e-66 | 5.071222 | 0 | 3 | ENSMMUG00000056515 | 2.987117 | 4.447157e-01 | 89 | -1 | 0.000000 | TRBV6-3 | tcr_genes | gex |
| 4.119979e-17 | 3.785737e-61 | 7.109418 | 0 | 0 | ENSMMUG00000051385 | 3.076480 | 1.395675e-01 | 29 | -1 | 0.000000 | TRBV7-4 | tcr_genes | gex |
| 5.181498e-04 | 8.604369e-46 | 6.726140 | 2 | 2 | ENSMMUG00000062974 | 2.156282 | 6.967033e-02 | 13 | -1 | 0.000000 | TRAV13-2 | tcr_genes | gex |
| 1.173425e-15 | 5.751559e-44 | 5.448288 | 1 | 5 | ENSMMUG00000043894 | 2.628155 | 2.579457e-01 | 32 | -1 | 0.000000 | TRBV19 | tcr_genes | gex |
| 7.460720e-01 | 2.190617e-32 | 6.246444 | 0 | 5 | ENSMMUG00000062211 | 2.302868 | 1.120602e-01 | 9 | -1 | 0.000000 | TRBV12-3 | tcr_genes | gex |
| 3.877101e-03 | 1.211601e-27 | 2.900490 | 0 | 3 | ENSMMUG00000061119 | 0.593705 | 1.030727e-01 | 89 | -1 | 0.000000 | TRAV19 | tcr_genes | gex |
| 2.189019e-25 | 1.551478e-27 | 4.503213 | 0 | 1 | ENSMMUG00000056515 | 2.908419 | 5.676314e-01 | 43 | -1 | 0.100000 | TRBV10-2 | tcr_genes | gex |
| 3.244874e-15 | 8.271428e-23 | 4.200101 | 0 | 1 | ENSMMUG00000056515 | 2.740876 | 5.816071e-01 | 40 | -1 | 0.000000 | TRBV6-2 | tcr_genes | gex |
| 6.150524e-04 | 1.308301e-21 | 6.151809 | 0 | 1 | ENSMMUG00000051385 | 2.778609 | 1.925398e-01 | 12 | -1 | 0.000000 | TRBV7-6 | tcr_genes | gex |
| 1.683511e+00 | 1.945236e-19 | 4.861590 | 2 | 5 | ENSMMUG00000051385 | 1.998051 | 1.982498e-01 | 14 | -1 | 0.000000 | TRBV5-6 | tcr_genes | gex |
| 1.120149e-01 | 2.156665e-15 | 4.792311 | 1 | 1 | ENSMMUG00000043894 | 2.362086 | 2.978226e-01 | 17 | -1 | 0.000000 | TRBV21-1 | tcr_genes | gex |
| 4.998835e-01 | 1.576433e-12 | 3.767046 | 4 | 8 | KLRB1 | 1.498821 | 2.274150e-01 | 32 | -1 | 1.000000 | TRAV1-2 | tcr_genes | gex |
| 1.900278e-01 | 8.629206e-06 | 2.087580 | 0 | 3 | ENSMMUG00000056515 | 1.546471 | 6.255701e-01 | 45 | -1 | 0.000000 | TRBV9 | tcr_genes | gex |
| 6.111556e-01 | 4.133253e-04 | 2.348992 | 0 | 4 | ENSMMUG00000056515 | 1.711141 | 6.366813e-01 | 28 | -1 | 0.000000 | TRBV10-1 | tcr_genes | gex |
| 5.625136e+00 | 1.952271e-02 | 2.281442 | 4 | 8 | IL7R | 1.939922 | 8.000069e-01 | 32 | -1 | 1.000000 | TRAV1-2 | tcr_genes | gex |
| 6.899273e+00 | 3.391942e-02 | 1.825299 | 3 | 5 | PPDPF | 1.242306 | 5.277901e-01 | 29 | -1 | 0.000000 | TRBV7-4 | tcr_genes | gex |
| 3.581038e-01 | 2.756213e-01 | 0.929537 | 0 | 1 | TIGAR | 1.490060 | 1.031299e+00 | 97 | -1 | 0.041667 | TRBJ1-2 | tcr_genes | gex |
| 2.382486e-03 | 4.042441e-01 | 0.416653 | 1 | 4 | RPL35A | 3.630398 | 3.350432e+00 | 49 | -1 | 0.000000 | TRAV27 | tcr_genes | gex |
| 5.202479e-01 | 1.337438e+00 | 0.426685 | 1 | 6 | RPL8 | 3.215522 | 2.933484e+00 | 63 | -1 | 0.000000 | TRBV20-1 | tcr_genes | gex |
| 6.754571e-01 | 4.552175e+00 | 2.381771 | 2 | 2 | ENSMMUG00000006206 | 1.457456 | 4.899458e-01 | 8 | -1 | 0.000000 | TRAJ3 | tcr_genes | gex |
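The log2enr values in this table can be reproduced from the mean_fg and mean_bg columns if one assumes that both means are on a log1p scale and that the fold change is taken after converting back with expm1. This is an observation about the tabulated numbers, not a statement of how CoNGA computes the column; the function name below is hypothetical. The same assumption would also account for the NaN in the TRBV6-8 row, where mean_bg is slightly negative.

```python
import math


def log2enr_from_means(mean_fg, mean_bg):
    """log2 fold change of foreground over background.

    Assumes both means are on a log1p scale, so expm1 maps them back to
    linear scale before the ratio is taken (assumption, inferred from
    the table rather than from the CoNGA source).
    """
    fg = math.expm1(mean_fg)
    bg = math.expm1(mean_bg)
    if fg <= 0 or bg <= 0:
        return float("nan")  # ratio undefined, e.g. a tiny negative mean_bg
    return math.log2(fg / bg)


# (mean_fg, mean_bg) pairs copied from three rows of the table.
trav8_7 = log2enr_from_means(2.728752, 1.152940e-02)   # table: 10.269545
tigar = log2enr_from_means(1.490060, 1.031299)         # table: 0.929537
trbv6_8 = log2enr_from_means(3.234783, -1.873645e-09)  # table: NaN
print(round(trav8_7, 3), round(tigar, 3), trbv6_8)
```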
tcr_genes_vs_gex_features_panels
Graph-versus-feature analysis was used to identify
a set of GEX features that showed biased distributions
in TCR neighborhoods. This plot shows the distribution of the
top-scoring GEX features on the TCR
UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie
Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons).
At most 3 features from clonotype neighborhoods
in each (GEX,TCR) cluster pair are shown. The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_tcr_genes_vs_gex_features_panels.png
gex_graph_vs_tcr_features
This table has results from a graph-vs-features analysis in which we
look at the distribution of a set of TCR-defined features over the GEX
neighbor graph. We look for neighborhoods in the graph that have biased
score distributions, as assessed first by a t-test, for speed, and then
by a Mann-Whitney U test for those neighborhood/score combinations whose
t-test P value passes an initial threshold (by default, 10 times the
P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
TCR feature.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
ttest_stat= ttest statistic (sign indicates whether the feature is up or down)
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the TCR score
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
| ttest_pvalue_adj | ttest_stat | mwu_pvalue_adj | gex_cluster | tcr_cluster | num_fg | mean_fg | mean_bg | feature | mait_fraction | clone_index | nbr_frac | graph_type | feature_type |
| 0.002003 | 5.266544 | 8.875816e-54 | 4 | 8 | 69 | 0.289855 | 0.002099 | mait | 1.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.000840 | 5.486600 | 1.298149e-42 | 4 | 8 | 69 | 0.318841 | 0.010493 | TRAV1-2 | 0.882353 | -1 | 0.00 | gex_cluster | tcr |
| 0.547702 | 4.922232 | 7.170575e-32 | 4 | 8 | 103 | 0.194175 | 0.002176 | mait | 0.800000 | 24 | 0.10 | gex_nbr | tcr |
| 0.388360 | 5.001786 | 7.130560e-24 | 4 | 8 | 103 | 0.213592 | 0.010881 | TRAV1-2 | 0.800000 | 24 | 0.10 | gex_nbr | tcr |
| 5.272281 | 4.353944 | 6.279203e-21 | 4 | 8 | 103 | 0.165049 | 0.005441 | mait | 0.680000 | 25 | 0.10 | gex_nbr | tcr |
| 5.272281 | 4.353944 | 6.279203e-21 | 4 | 8 | 103 | 0.165049 | 0.005441 | mait | 0.680000 | 41 | 0.10 | gex_nbr | tcr |
| 0.064964 | 4.327631 | 5.109696e-19 | 4 | 8 | 69 | 0.246377 | 0.020986 | TRBJ2-6 | 0.705882 | -1 | 0.00 | gex_cluster | tcr |
| 3.834780 | 4.433647 | 8.530963e-16 | 4 | 8 | 103 | 0.184466 | 0.014146 | TRAV1-2 | 0.680000 | 41 | 0.10 | gex_nbr | tcr |
| 3.206622 | 4.478261 | 2.470043e-14 | 4 | 8 | 103 | 0.194175 | 0.018498 | TRBJ2-6 | 0.480000 | 24 | 0.10 | gex_nbr | tcr |
| 8.110873 | 4.238550 | 2.078654e-13 | 4 | 8 | 103 | 0.174757 | 0.015234 | TRAV1-2 | 0.640000 | 23 | 0.10 | gex_nbr | tcr |
| 0.059479 | 4.348959 | 2.510116e-13 | 4 | 8 | 69 | 0.275362 | 0.039874 | TRAJ33 | 1.000000 | -1 | 0.00 | gex_cluster | tcr |
| 5.827015 | 2.882367 | 1.035680e-03 | 3 | 3 | 158 | 0.069620 | 0.010417 | TRAV17 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.000862 | 5.026611 | 2.711672e-03 | 0 | 3 | 297 | 0.210607 | 0.055467 | cd8 | 0.013514 | -1 | 0.00 | gex_cluster | tcr |
| 0.003944 | 5.010718 | 7.681031e-03 | 4 | 1 | 69 | 78.420290 | 65.929696 | alphadist | 0.058824 | -1 | 0.00 | gex_cluster | tcr |
| 0.064001 | -4.129941 | 5.010006e-02 | 2 | 1 | 183 | -0.033109 | 0.129706 | cd8 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.003420 | -4.786657 | 9.405589e-02 | 1 | 6 | 184 | -0.031455 | 0.129537 | cd8 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.375490 | -3.802146 | 1.662729e-01 | 4 | 8 | 69 | -0.119680 | 0.436249 | af5 | 0.764706 | -1 | 0.00 | gex_cluster | tcr |
| 0.000731 | -5.037161 | 1.812127e-01 | 0 | 5 | 297 | 0.016835 | 0.080000 | TRBV20-1 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.277927 | -3.889917 | 2.232103e-01 | 4 | 8 | 69 | -0.695780 | 0.037686 | af3 | 0.647059 | -1 | 0.00 | gex_cluster | tcr |
| 0.311807 | -3.862978 | 3.091960e-01 | 4 | 8 | 69 | -0.246574 | 0.151507 | kf7 | 0.705882 | -1 | 0.00 | gex_cluster | tcr |
| 0.008690 | -5.760626 | 3.366695e-01 | 1 | 6 | 103 | -0.113480 | 0.124540 | cd8 | 0.000000 | 593 | 0.10 | gex_nbr | tcr |
| 6.534299 | 2.835147 | 3.637232e-01 | 2 | 6 | 183 | 0.120219 | 0.048868 | TRBV20-1 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.285774 | -5.022732 | 3.672394e-01 | 2 | 6 | 103 | -0.120508 | 0.125328 | cd8 | 0.000000 | 163 | 0.10 | gex_nbr | tcr |
| 0.616313 | 4.835466 | 5.733172e-01 | 0 | 1 | 103 | 0.299643 | 0.078238 | cd8 | 0.000000 | 20 | 0.10 | gex_nbr | tcr |
| 9.197958 | -6.563406 | 5.917695e-01 | 4 | 8 | 11 | -0.939361 | 0.136207 | kf7 | 1.000000 | 27 | 0.01 | gex_nbr | tcr |
| 0.231033 | -5.069156 | 6.242221e-01 | 2 | 6 | 103 | -0.118154 | 0.125064 | cd8 | 0.000000 | 1010 | 0.10 | gex_nbr | tcr |
| 3.020025 | 3.064586 | 7.254659e-01 | 0 | 3 | 297 | 0.134680 | 0.067586 | TRAV19 | 0.000000 | -1 | 0.00 | gex_cluster | tcr |
| 0.145216 | -5.166565 | 8.317266e-01 | 2 | 6 | 103 | -0.109760 | 0.124123 | cd8 | 0.000000 | 545 | 0.10 | gex_nbr | tcr |
| 0.856814 | -3.563955 | 5.661843e+00 | 4 | 8 | 69 | 0.543890 | 0.582572 | nndists_tcr | 0.941176 | -1 | 0.00 | gex_cluster | tcr |
| 0.601287 | -4.829535 | 7.050044e+00 | 1 | 6 | 103 | -0.081544 | 0.120961 | cd8 | 0.000000 | 432 | 0.10 | gex_nbr | tcr |
| 0.276291 | -5.017002 | 7.982202e+00 | 1 | 6 | 103 | -0.099078 | 0.122926 | cd8 | 0.000000 | 129 | 0.10 | gex_nbr | tcr |
| 0.272385 | -5.009599 | 8.364139e+00 | 1 | 0 | 103 | -0.085437 | 0.121397 | cd8 | 0.000000 | 680 | 0.10 | gex_nbr | tcr |
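Because the adjusted P values in this table are raw P values multiplied by the number of comparisons, they are not capped at 1, which is why entries such as 5.272281 or 9.197958 appear. The sketch below shows that adjustment and the t-test screen described for this table. The function names are hypothetical, and the default pvalue_threshold of 1.0 is an assumption borrowed from the display threshold quoted for the summary plots; with that value the screen sits at 10, and every ttest_pvalue_adj in the table is indeed below 10.

```python
def adjusted_pvalue(raw_pvalue, num_comparisons):
    """Raw P value times the number of comparisons (not capped at 1)."""
    return raw_pvalue * num_comparisons


def passes_ttest_screen(ttest_pvalue_adj, pvalue_threshold=1.0,
                        screen_factor=10.0):
    """True if the slower Mann-Whitney U test would be run for this pair.

    The screen is screen_factor times the P-value threshold; the value
    1.0 for the threshold is an assumption, not read from the run.
    """
    return ttest_pvalue_adj <= screen_factor * pvalue_threshold


# Hypothetical raw P value and comparison count, for illustration only.
example_adj = adjusted_pvalue(0.001, 5000)

# 9.197958 is the largest ttest_pvalue_adj in the table; 12.5 is made up.
kept = passes_ttest_screen(9.197958)
dropped = passes_ttest_screen(12.5)
print(example_adj, kept, dropped)
```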
gex_graph_vs_tcr_features_plot
This plot summarizes the results of a graph
versus features analysis by labeling the clonotypes at the center of
each biased neighborhood with the name of the feature biased in that
neighborhood. The feature names are drawn in colored boxes whose
color is determined by the strength and direction of the feature score bias
(from bright red for features that are strongly elevated to bright blue
for features that are strongly decreased in the corresponding neighborhoods,
relative to the rest of the dataset).
At most one feature (the top scoring) is shown for each clonotype
(ie, neighborhood). The UMAP xy coordinates for this plot are
stored in adata.obsm['X_gex_2d']. The score used for ranking correlations
is 'mwu_pvalue_adj'. The threshold score for displaying a feature is
1.0. The feature column is 'feature'. Since
we also run graph-vs-features using "neighbor" graphs that are defined
by clusters, ie where each clonotype is connected to all the other
clonotypes in the same cluster, some biased features may be associated with
a cluster rather than a specific clonotype. Those features are labeled with
a '*' at the end and shown near the centroid of the clonotypes belonging
to that cluster.
Image source: tmp_rhesus_gex_graph_vs_tcr_features_plot.png
gex_graph_vs_tcr_features_panels
Graph-versus-feature analysis was used to identify
a set of TCR features that showed biased distributions
in GEX neighborhoods. This plot shows the distribution of the
top-scoring TCR features on the GEX
UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie
Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons).
At most 3 features from clonotype neighborhoods
in each (GEX,TCR) cluster pair are shown. The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_gex_graph_vs_tcr_features_panels.png
graph_vs_features_gex_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
GEX landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_gex' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are GEX clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=102 nearest neighbors (0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Since features of the same type (GEX or TCR) as the landscape and
neighbor graph (ie GEX features) are more highly
correlated over graph neighborhoods, their neighbor-averaged scores
will show more extreme variation. For this reason, the nbr-averaged
scores for these features from the same modality as the landscape
itself are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores (multiply by 3.03
to get the color scores for the GEX features).
Image source: tmp_rhesus_graph_vs_features_gex_clustermap.png
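The score transformation described for this clustermap (Z-score normalization, neighbor averaging, then downscaling of features from the landscape's own modality) can be sketched as follows. This is a rough illustration, not CoNGA's implementation: the function names and toy data are hypothetical, and both the use of the population standard deviation and the inclusion of each clonotype in its own average are assumptions.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalize a list of feature scores."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    return [(v - mu) / sigma for v in values]


def clustermap_scores(values, neighbors, same_modality, rescale_factor=0.33):
    """Z-score, neighbor-average, and optionally downscale a feature.

    same_modality: True when the feature comes from the same modality as
    the landscape (for example a GEX feature on the GEX landscape).
    """
    z = zscore(values)
    averaged = []
    for i, nbrs in enumerate(neighbors):
        group = [z[i]] + [z[j] for j in nbrs]  # include self (assumed)
        averaged.append(sum(group) / len(group))
    if same_modality:
        averaged = [rescale_factor * a for a in averaged]
    return averaged


# Toy data (hypothetical): four clonotypes, K = 1.
values = [1.0, 2.0, 3.0, 4.0]
knn = [[1], [0], [3], [2]]
other = clustermap_scores(values, knn, same_modality=False)
own = clustermap_scores(values, knn, same_modality=True)

# Undoing the downscaling means multiplying by 1 / 0.33, about 3.03.
undo_factor = round(1 / 0.33, 2)
```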
graph_vs_features_tcr_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
TCR landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_tcr' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are TCR clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=102 nearest neighbors (0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Since features of the same type (GEX or TCR) as the landscape and
neighbor graph (ie TCR features) are more highly
correlated over graph neighborhoods, their neighbor-averaged scores
will show more extreme variation. For this reason, the nbr-averaged
scores for these features from the same modality as the landscape
itself are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores (multiply by 3.03
to get the color scores for the TCR features).
Image source: tmp_rhesus_graph_vs_features_tcr_clustermap.png
graph_vs_summary
Summary figure for the graph-vs-graph and
graph-vs-features analyses.
Image source: tmp_rhesus_graph_vs_summary.png
hotspot_features
Find GEX (TCR) features that show a biased
distribution across the TCR (GEX) neighbor graph,
using a simplified version of the Hotspot method
from the Yosef lab.
DeTomaso, D., & Yosef, N. (2021).
"Hotspot identifies informative gene modules across modalities
of single-cell genomics."
Cell Systems, 12(5), 446–456.e9.
PMID:33951459
Columns:
Z: HotSpot Z statistic
pvalue_adj: Raw P value times the number of tests (crude Bonferroni
correction)
nbr_frac: The K NN nbr fraction used for the neighbor graph construction
(nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)
| Z | pvalue_adj | feature | feature_type | nbr_frac |
| 63.643690 | 0.000000e+00 | ENSMMUG00000056515 | gex | 0.10 |
| 59.411769 | 0.000000e+00 | ENSMMUG00000043894 | gex | 0.10 |
| 41.269847 | 0.000000e+00 | ENSMMUG00000061119 | gex | 0.10 |
| 40.066080 | 0.000000e+00 | ENSMMUG00000062085 | gex | 0.10 |
| 39.953858 | 0.000000e+00 | ENSMMUG00000060662 | gex | 0.10 |
| 39.472488 | 0.000000e+00 | ENSMMUG00000060662 | gex | 0.01 |
| 38.206218 | 0.000000e+00 | ENSMMUG00000052673 | gex | 0.10 |
| 35.967220 | 1.946855e-279 | ENSMMUG00000043894 | gex | 0.01 |
| 35.168258 | 4.370481e-267 | ENSMMUG00000056515 | gex | 0.01 |
| 33.336175 | 8.259764e-240 | ENSMMUG00000057062 | gex | 0.10 |
| 29.458605 | 6.982233e-187 | ENSMMUG00000061119 | gex | 0.01 |
| 26.095667 | 2.918464e-146 | ENSMMUG00000061081 | gex | 0.10 |
| 25.196685 | 3.113142e-136 | ENSMMUG00000062085 | gex | 0.01 |
| 25.093568 | 4.178786e-135 | ENSMMUG00000062211 | gex | 0.10 |
| 24.363803 | 3.601598e-129 | mait | tcr | 0.10 |
| 24.372706 | 2.381113e-127 | ENSMMUG00000057062 | gex | 0.01 |
| 23.428392 | 1.569420e-117 | ENSMMUG00000054409 | gex | 0.10 |
| 22.289899 | 4.028425e-108 | mait | tcr | 0.01 |
| 21.784105 | 2.346674e-101 | ENSMMUG00000052673 | gex | 0.01 |
| 21.616693 | 8.944171e-100 | ENSMMUG00000065017 | gex | 0.10 |
| 20.803983 | 2.847160e-92 | ENSMMUG00000054409 | gex | 0.01 |
| 20.422579 | 7.530645e-89 | CEBPD | gex | 0.01 |
| 19.385046 | 7.377658e-80 | gex_cluster4 | gex | 0.01 |
| 18.691667 | 4.137002e-74 | ENSMMUG00000056431 | gex | 0.10 |
| 17.915819 | 6.344050e-68 | ENSMMUG00000065017 | gex | 0.01 |
| 17.755709 | 1.112891e-66 | ENSMMUG00000056431 | gex | 0.01 |
| 17.437876 | 3.042184e-64 | ENSMMUG00000061081 | gex | 0.01 |
| 16.447369 | 7.620416e-59 | TRAV1-2 | tcr | 0.10 |
| 16.591287 | 5.759498e-58 | ENSMMUG00000063185 | gex | 0.10 |
| 15.681633 | 1.757348e-53 | TRAV1-2 | tcr | 0.01 |
| 15.085948 | 1.432285e-47 | ENSMMUG00000062211 | gex | 0.01 |
| 14.305926 | 1.436347e-42 | ENSMMUG00000051385 | gex | 0.10 |
| 13.955132 | 2.092637e-40 | ENSMMUG00000059325 | gex | 0.10 |
| 13.575538 | 4.866011e-40 | cd8 | tcr | 0.10 |
| 13.766224 | 2.908768e-39 | CEBPD | gex | 0.10 |
| 13.386066 | 5.213968e-37 | gex_cluster4 | gex | 0.10 |
| 12.648722 | 8.127933e-33 | ENSMMUG00000003532 | gex | 0.10 |
| 12.609744 | 1.333800e-32 | ARHGAP8 | gex | 0.01 |
| 11.461007 | 1.480295e-26 | ENSMMUG00000063185 | gex | 0.01 |
| 11.332531 | 6.472525e-26 | ENSMMUG00000059325 | gex | 0.01 |
| 10.567272 | 3.020817e-22 | gex_cluster0 | gex | 0.10 |
| 10.134396 | 2.778910e-20 | ENSMMUG00000051385 | gex | 0.01 |
| 9.463995 | 2.578652e-19 | TRBJ2-6 | tcr | 0.10 |
| 9.711782 | 1.919955e-18 | KLRB1 | gex | 0.01 |
| 9.238144 | 2.182051e-18 | tcr_cluster8 | tcr | 0.10 |
| 8.694954 | 3.018653e-16 | tcr_cluster6 | tcr | 0.10 |
| 8.612584 | 6.214564e-16 | tcr_cluster8 | tcr | 0.01 |
| 8.903537 | 3.866700e-15 | ENSMMUG00000056910 | gex | 0.10 |
| 8.841629 | 6.742919e-15 | KLRG1 | gex | 0.01 |
| 8.226785 | 1.673060e-14 | TRBV20-1 | tcr | 0.10 |
Omitted 79 lines
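Two details of this table can be illustrated with a short sketch: how nbr_frac maps to a neighbor count K, and why the largest Z values are listed with pvalue_adj of exactly 0. The sketch assumes that K is obtained by truncating nbr_frac times the number of clonotypes and that the raw P value is the upper tail of a standard normal distribution; both are assumptions, and the function names are hypothetical. Under the second assumption the raw P value for a very large Z is too small for double-precision arithmetic, so it becomes 0 before any multiplication by the number of tests.

```python
import math


def neighbors_from_fraction(nbr_frac, num_clonotypes):
    """K nearest neighbors implied by a neighbor fraction (truncation assumed)."""
    return max(1, int(nbr_frac * num_clonotypes))


def upper_tail_pvalue(z):
    """One-sided P value of a standard normal Z statistic (assumed null)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 1022 is num_clonotypes from the Stats section of this report.
k_large = neighbors_from_fraction(0.1, 1022)
k_small = neighbors_from_fraction(0.01, 1022)

# Z values copied from the table.
p_top = upper_tail_pvalue(63.643690)   # underflows in double precision
p_mid = upper_tail_pvalue(35.967220)   # tiny but still representable
print(k_large, k_small, p_top, p_mid)
```

With these assumptions K is 102 for nbr_frac 0.1, which agrees with the K=102 quoted for the clustermaps and with the num_fg values of 103 (neighbors plus the central clonotype) in the graph-vs-features tables.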
hotspot_gex_umap
HotSpot analysis (Nir Yosef lab, PMID: 33951459)
was used to identify a set of GEX (TCR) features that showed biased
distributions in TCR (GEX) space. This plot shows the distribution of the
top-scoring HotSpot features on the GEX
UMAP 2D landscape. The features are ranked by adjusted P value
(raw P value * number of comparisons). The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Features are filtered based on correlation coefficient to reduce
redundancy: if a feature has a correlation of >= 0.9
(the max_feature_correlation argument to conga.plotting.plot_hotspot_umap)
to a previously plotted feature, that feature is skipped.
Points are plotted in order of increasing feature score.
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
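The redundancy filter described for this plot can be sketched as a greedy pass over the ranked features. This is an illustrative sketch, not the code behind conga.plotting.plot_hotspot_umap: the helper names and toy score vectors are hypothetical, and the use of the signed Pearson correlation (rather than its absolute value) is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def filter_redundant(ranked_features, max_feature_correlation=0.9):
    """Keep features, best first, skipping any that correlate at or above
    the cutoff with a feature that has already been kept."""
    kept = []
    for name, scores in ranked_features:
        if any(pearson(scores, prev) >= max_feature_correlation
               for _, prev in kept):
            continue
        kept.append((name, scores))
    return [name for name, _ in kept]


# Toy score vectors (hypothetical), listed from best to worst rank.
ranked = [
    ("feature_a", [1.0, 2.0, 3.0, 4.0]),
    ("feature_b", [2.0, 4.0, 6.0, 8.5]),  # nearly proportional to feature_a
    ("feature_c", [4.0, 1.0, 3.0, 2.0]),
]
plotted = filter_redundant(ranked)
print(plotted)
```

Because the pass is greedy, the outcome depends on the ranking: a lower-ranked feature is dropped in favor of the higher-ranked one it correlates with.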
hotspot_gex_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
GEX landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_gex' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are GEX clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=102 nearest neighbors (0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Since features of the same type (GEX or TCR) as the landscape and
neighbor graph (ie GEX features) are more highly
correlated over graph neighborhoods, their neighbor-averaged scores
will show more extreme variation. For this reason, the nbr-averaged
scores for these features from the same modality as the landscape
itself are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores (multiply by 3.03
to get the color scores for the GEX features).
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png
hotspot_tcr_umap
HotSpot analysis (Nir Yosef lab, PMID: 33951459)
was used to identify a set of GEX (TCR) features that showed biased
distributions in TCR (GEX) space. This plot shows the distribution of the
top-scoring HotSpot features on the TCR
UMAP 2D landscape. The features are ranked by adjusted P value
(raw P value * number of comparisons). The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Features are filtered based on correlation coefficient to reduce
redundancy: if a feature has a correlation of >= 0.9
(the max_feature_correlation argument to conga.plotting.plot_hotspot_umap)
with a previously plotted feature, that feature is skipped.
Points are plotted in order of increasing feature score, so the
highest-scoring points are drawn last.
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png
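The ranking and redundancy filter described for this plot can be
sketched as a greedy pass over the features. This is only an
illustration under stated assumptions: the function names are
hypothetical, the signed Pearson correlation is used (the text does not
say whether CoNGA takes the absolute value), and the adjusted P value is
not capped at 1.

```python
import numpy as np

def adjusted_pvalues(raw_pvalues, num_comparisons):
    # Adjusted P value as described above: raw P value * number of comparisons.
    return np.asarray(raw_pvalues, dtype=float) * num_comparisons

def select_features_to_plot(scores, raw_pvalues, num_comparisons,
                            max_feature_correlation=0.9):
    """Hypothetical helper: return indices of features to plot, best first.

    scores: (num_features, num_clonotypes) array of feature scores.
    """
    scores = np.asarray(scores, dtype=float)
    padj = adjusted_pvalues(raw_pvalues, num_comparisons)
    ranked = np.argsort(padj, kind="stable")  # most significant first
    kept = []
    for i in ranked:
        # Skip a feature that is too correlated with one already selected.
        # ASSUMPTION: signed Pearson correlation between score vectors.
        redundant = any(
            np.corrcoef(scores[i], scores[j])[0, 1] >= max_feature_correlation
            for j in kept
        )
        if not redundant:
            kept.append(int(i))
    return kept
```

Because the pass is greedy, which member of a correlated group survives
depends on the ranking: the feature with the smallest adjusted P value
is kept and later near-duplicates are dropped.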
hotspot_tcr_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
TCR landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_tcr' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
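Row ordering by hierarchical clustering with a correlation metric can
be sketched with SciPy as below. This is an illustration only: the
function name is hypothetical and the "average" linkage method is an
assumption, since the text above specifies the metric but not the
linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def order_rows_by_correlation(matrix, method="average"):
    """Hypothetical helper: return row indices in dendrogram leaf order.

    matrix: (num_features, num_clonotypes) array; each row is one feature.
    """
    matrix = np.asarray(matrix, dtype=float)
    # Correlation distance (1 - Pearson r) between rows drives the clustering.
    # ASSUMPTION: average linkage; the source does not name the method.
    tree = linkage(matrix, method=method, metric="correlation")
    return leaves_list(tree)
```

Rows whose score profiles rise and fall together across clonotypes end
up adjacent in the returned order, which is what makes blocks of
co-varying features visible in the heatmap.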
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are TCR clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=102 nearest neighbors (K=0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Features of the same type as the landscape and neighbor graph
(i.e., TCR features) are more highly correlated over graph
neighborhoods, so their neighbor-averaged scores show more extreme
variation. For this reason, the nbr-averaged scores for these
same-modality features are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colorbar in the top left shows the Z-score normalized,
neighbor-averaged scores; multiply its values by 3.03 (= 1/0.33)
to get the unscaled scores for the TCR features.
Image source: tmp_rhesus_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png
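The 3.03 multiplier quoted for this clustermap is the reciprocal of
rescale_factor_for_self_features=0.33. A small helper for reading
values off the colorbar might look like the following; the function name
is hypothetical, and it assumes that only same-modality features (TCR
features on this landscape) were downscaled.

```python
def colorbar_to_unscaled_score(colorbar_value, is_self_feature,
                               rescale_factor_for_self_features=0.33):
    """Hypothetical helper: undo the downscaling applied to self features."""
    if is_self_feature:
        # Same-modality features were multiplied by the rescale factor
        # before coloring, so divide to recover the nbr-averaged Z-score.
        return colorbar_value / rescale_factor_for_self_features
    # Features from the other modality are shown unscaled.
    return colorbar_value
```

For example, a TCR feature drawn at a colorbar value of 1.0 has an
unscaled neighbor-averaged Z-score of roughly 3.03, while a GEX feature
drawn at 1.0 is read directly as 1.0.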